
gh-109593: ResourceTracker.ensure_running() calls finalizers #109620



Closed · wants to merge 1 commit from the mp_ensure_running branch

Conversation

@vstinner (Member) commented Sep 20, 2023

multiprocessing: Reduce the risk of reentrant calls to ResourceTracker.ensure_running() by explicitly running all finalizers before acquiring the ResourceTracker lock.
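
For readers not familiar with the internals, a minimal sketch of the approach described above; it is not the actual diff, and the lock and method body below are stand-ins for the real ResourceTracker.ensure_running() in Lib/multiprocessing/resource_tracker.py (util._run_finalizers() is a private multiprocessing helper):

# Sketch only, not the actual patch: run pending finalizers before taking
# the ResourceTracker lock, so a finalizer cannot re-enter ensure_running()
# (and deadlock) while the lock is already held.
import threading
from multiprocessing import util

class ResourceTracker:
    def __init__(self):
        self._lock = threading.Lock()

    def ensure_running(self):
        util._run_finalizers()   # run finalizers now, outside the lock
        with self._lock:
            pass  # stand-in for the real body (check/spawn the tracker process)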

@pitrou (Member) commented Sep 20, 2023

Hmm, but _run_finalizers should only be run at process end.

@vstinner (Member, Author)

Oh, test_asyncio.test_events fails with this change: specifically, test_get_event_loop_new_process().

@vstinner (Member, Author)

> Hmm, but _run_finalizers should only be run at process end.

What do you mean by "process end"?

@pitrou (Member) commented Sep 20, 2023

Sorry, I meant interpreter shutdown.

@vstinner (Member, Author)

> Hmm, but _run_finalizers should only be run at process end.

Oh, in fact what I want is to trigger a GC collection; I fixed my PR to do that.

I confirm that my change fixes the #109593 (comment) reproducer.

@pitrou: Would you mind reviewing this updated fix?

multiprocessing: Reduce the risk of reentrant calls to
ResourceTracker.ensure_running() by explicitly running a garbage
collection, to call pending finalizers, before acquiring the
ResourceTracker lock.
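
For comparison with the earlier sketch, the updated approach replaces the explicit finalizer run with a full garbage collection (again a sketch with a stand-in lock and body, not the real diff):

# Sketch only: trigger a collection so pending finalizers fire here,
# before the lock is acquired, instead of firing (and re-entering
# ensure_running()) while the lock is held.
import gc
import threading

class ResourceTracker:
    def __init__(self):
        self._lock = threading.Lock()

    def ensure_running(self):
        gc.collect()  # finalizers of collectable objects run here
        with self._lock:
            pass  # stand-in for the real body (check/spawn the tracker process)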
@pitrou (Member) commented Sep 20, 2023

Ok, this is better, but the problem is that gc.collect is slow, and ensure_running is called indirectly when instantiating a multiprocessing lock (see SemLock.__init__).

Instantiating a semaphore is currently 500x faster than a GC collection, and that's an optimistic measurement without a lot of objects allocated:

>>> %timeit gc.collect()
10.7 ms ± 352 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit mp.Semaphore()
21.7 µs ± 395 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
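
(For reference, a rough stdlib equivalent of the %timeit comparison above; this is only a sketch, and the absolute numbers depend heavily on the machine and on how many objects are alive:)

# Rough reproduction of the comparison with timeit instead of IPython.
import gc
import timeit
import multiprocessing as mp

n = 100
gc_ms = timeit.timeit(gc.collect, number=n) / n * 1e3
sem_us = timeit.timeit(mp.Semaphore, number=n) / n * 1e6
print(f"gc.collect():   {gc_ms:.2f} ms per call")
print(f"mp.Semaphore(): {sem_us:.2f} µs per call")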

@vstinner (Member, Author)

> Ok, this is better, but the problem is that gc.collect is slow, and ensure_running is called indirectly when instantiating a multiprocessing lock (see SemLock.__init__).

Do you want to propose a different fix?

My concern is that currently, multiprocessing can hang randomly. IMO it's bad and must be fixed. I prefer a slow multiprocessing to a multiprocessing that hangs randomly.

Also, I have limited interest in developing the most efficient fix. So if you have free cycles, please go ahead :-)

These days, I'm busy fixing tons of buildbot failures.

Obviously, I would be fine with a fast and correct fix for this issue :-)

@pitrou (Member) commented Sep 20, 2023

Well, TBH, I'm not sure this issue is very common, as I don't think I have ever seen it elsewhere.
It does deserve fixing, but it is not that urgent either (at least not for users of Python :-)).

But, yes, I'll try to come up with a fix.

@vstinner (Member, Author)

> Well, TBH, I'm not sure this issue is very common, as I don't think I have ever seen it elsewhere.

It makes more and more buildbots fail, so for me it's urgent.

What I meant in my previous message is that I'm considering fixing the issue right now; we can take time later to revisit it and find a better fix (with a lower impact on performance).

@pitrou (Member) commented Sep 20, 2023

This code isn't new, so it's surprising it's failing "more and more"?

@vstinner (Member, Author)

> This code isn't new, so it's surprising it's failing "more and more"?

I recently modified the CI to stop silently ignoring tests that fail randomly (FAILURE then SUCCESS), by passing the new --fail-rerun option to regrtest. It's likely that this hang was previously ignored silently by the CIs.

The affected buildbot, "PPC64LE Fedora Stable Refleaks 3.x", is blocking Python releases: it's part of the STABLE buildbots.

@pitrou (Member) commented Sep 20, 2023

Well... do you want to undo the change on the failing buildbot until we fix this issue?

@pitrou (Member) commented Sep 20, 2023

Please take a look at alternate PR #109629

@vstinner (Member, Author)

> do you want to undo the change on the failing buildbot until we fix this issue?

Oh sure, if I don't have the bandwidth to fix regressions, I will undo this change once we have listed the failing tests. That would be reasonable.

@vstinner vstinner marked this pull request as draft September 21, 2023 07:53
@vstinner (Member, Author)

Since @pitrou has a better approach, I'm converting this change to a draft for now.

@vstinner vstinner closed this Oct 3, 2023
@vstinner vstinner deleted the mp_ensure_running branch October 3, 2023 15:48
Labels: needs backport to 3.11, needs backport to 3.12